Econometrics - Lecture 5

Multiple Linear Regression

Author

Logan Kelly, Ph.D.

Published

January 28, 2025

1 Introduction

  • In this lecture, we extend the simple linear regression framework to include multiple predictors.
  • This approach allows us to understand the impact of each independent variable on the dependent variable while controlling for other factors.
  • By adding more predictors, we can better capture the complexity of real-world relationships and reduce omitted variable bias.

2 Key Concepts

  • Adding More Predictors
  • Partial Effects (Controlling for Other Variables)
  • Interpretation Nuances (Coefficients in Multiple Regression)
  • Model Selection & Fit Criteria

3 Theoretical Discussion

A multiple linear regression model with \(k\) predictors can be written as:

\[ y = \beta_0 + \beta_1 x_1 + \beta_2 x_2 + \dots + \beta_k x_k + \varepsilon \]

  • \(y\): The dependent variable
  • \(x_1, x_2, \dots\): Independent variables (predictors)
  • \(\beta_0\): The intercept, representing the expected value of \(y\) when all predictors are zero
  • \(\beta_1, \beta_2, \dots\): Slope coefficients, each indicating how \(y\) changes with a one-unit increase in the corresponding predictor, holding other variables constant
  • \(\varepsilon\): The error term, assumed to have a mean of zero if the model is correctly specified
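
In R, a model of this form is estimated with lm(). Below is a minimal sketch using simulated data; the variable names and coefficient values are illustrative:

Code
# Simulate data from y = 2 + 1.5*x1 - 0.7*x2 + eps and fit by OLS
set.seed(42)
n  <- 200
x1 <- rnorm(n)
x2 <- rnorm(n)
y  <- 2 + 1.5 * x1 - 0.7 * x2 + rnorm(n)

fit <- lm(y ~ x1 + x2)
summary(fit)  # estimates should be close to 2, 1.5, and -0.7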

3.1 Partial Effects and Controlling for Other Variables

  • Partial Effect: In a multiple regression, \(\beta_1\) represents the expected change in \(y\) for a one-unit increase in \(x_1\), holding \(x_2, x_3, \dots\) constant.
  • Mathematically, each coefficient is the partial derivative of \(y\) with respect to \(x_j\): \[\beta_j = \frac{\partial y}{\partial x_j}\]
  • Importance of Controlling: By including additional predictors, we can isolate the effect of each variable and reduce omitted variable bias, as the simulation sketch below illustrates.
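
To make this concrete, the following simulation (a sketch; the coefficient values are illustrative) shows how omitting a predictor that is correlated with \(x_1\) biases the estimated effect of \(x_1\):

Code
# Simulate correlated predictors and compare short vs. full regressions
set.seed(123)
n  <- 500
x1 <- rnorm(n)
x2 <- 0.8 * x1 + rnorm(n)   # x2 is correlated with x1
y  <- 1 + 2 * x1 + 3 * x2 + rnorm(n)

coef(lm(y ~ x1))        # short regression: slope on x1 is biased (near 2 + 3*0.8 = 4.4)
coef(lm(y ~ x1 + x2))   # full regression: recovers beta1 ~ 2 and beta2 ~ 3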

3.2 Interpretation Nuances

  • Coefficient Magnitude: Each \(\beta_j\) shows how \(y\) changes with \(x_j\), keeping other predictors fixed.
  • Significance and p-values: Determine if \(\beta_j\) is significantly different from zero.
  • Multicollinearity: Highly correlated predictors can inflate standard errors, complicating interpretation; see the VIF check sketched after this list.
  • Model Fit: Metrics like Adjusted R-squared become more relevant when comparing models with different numbers of predictors.
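
The VIF check referenced above is sketched here using the olsrr function ols_vif_tol(), which reports tolerance and variance inflation factors for each predictor; VIF values well above roughly 5–10 are a common warning sign. The model choice is illustrative:

Code
# VIF diagnostic: wt, disp, and hp in mtcars are highly correlated
pacman::p_load(olsrr)
fit <- lm(mpg ~ wt + disp + hp, data = mtcars)
ols_vif_tol(fit)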

3.3 Model Selection & Fit Criteria

Model selection involves determining which predictors to include in a multiple regression model, balancing model performance with interpretability. Below is an overview of common techniques and considerations for selecting and evaluating models.

  • Forward, Backward, Stepwise Selection

    • Forward: Begin with no predictors, adding them one at a time based on a selection criterion such as a p-value threshold or an information criterion like AIC.

    • Backward: Start with all candidate predictors, removing them one at a time, typically removing the least significant predictor at each step.

    • Stepwise: Combine forward and backward methods by iteratively adding or removing predictors, attempting to find an optimal subset.

    • Pros: Automated procedure, helps narrow down large sets of predictors.

    • Cons: Can overlook important predictors or retain irrelevant ones, sensitive to the order of entry or removal, may lead to overfitting.

  • AIC and BIC (Information Criteria)

    Information criteria provide a way to compare model fit while penalizing excessive complexity. They are based on the log-likelihood function of the model.

    The Log-Likelihood Function for a multiple linear regression model assuming normally distributed errors is:

    \[ \ln(L) = -\frac{n}{2} \ln(2\pi\sigma^2) - \frac{1}{2\sigma^2} \sum_{i=1}^n (y_i - \hat{y}_i)^2 \]

    where

    • \(n\) is the number of observations,
    • \(\sigma^2\) is the variance of the error term,
    • \(\hat{y}_i\) is the predicted value of \(y_i\).

    The Akaike Information Criterion (AIC) and Bayesian Information Criterion (BIC) are defined as:

    \[ \text{AIC} = 2k - 2\ln(\hat{L}) \]

    \[ \text{BIC} = k\ln(n) - 2\ln(\hat{L}) \]

    where

    • \(k\) is the number of parameters in the model,
    • \(\ln(\hat{L})\) is the maximized value of the log likelihood function for the model,
    • \(n\) is the sample size.

Use Cases:

  • AIC is generally preferred when the primary goal is prediction. It balances model fit with complexity but imposes a lighter penalty on additional parameters, making it suitable for scenarios where omitted variable bias is a greater concern than overfitting.

  • BIC is often favored when the goal is to identify the structural model that generated the data. It imposes a heavier penalty on additional parameters, which can be advantageous in avoiding overfitting and selecting simpler models.

  • Comparing Models: Information criteria are only comparable across models fitted to the same outcome variable and the same sample. A difference of about 2–7 points in AIC or BIC indicates moderate evidence in favor of the model with the lower score; larger differences indicate stronger evidence.
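
These criteria can be verified by hand against R's built-in AIC() and BIC() functions. One detail, reflected in the sketch below: for a linear model, R counts the estimated error variance \(\sigma^2\) as one of the \(k\) parameters, so \(k\) is the number of coefficients plus one.

Code
# Compute the log-likelihood, AIC, and BIC by hand and check against base R
fit <- lm(mpg ~ wt + hp, data = mtcars)

n      <- nobs(fit)
rss    <- sum(resid(fit)^2)
sigma2 <- rss / n                     # ML estimate of the error variance
ll     <- -n / 2 * log(2 * pi * sigma2) - rss / (2 * sigma2)

k <- length(coef(fit)) + 1            # intercept + slopes, plus sigma^2

c(manual = 2 * k - 2 * ll, builtin = AIC(fit))        # AIC
c(manual = k * log(n) - 2 * ll, builtin = BIC(fit))   # BIC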

Practical Tips

  • Do not rely solely on p-values. Incorporate theoretical considerations, domain expertise, and additional metrics such as adjusted R-squared, AIC, or BIC.

  • Check diagnostics. Even models that look good on paper can fail if they violate OLS assumptions such as linearity or homoscedasticity.

  • Aim for parsimony. Strive for a balance between simplicity and explanatory power to avoid overfitting and to keep the model interpretable.

4 Case Study

We will perform a multiple regression analysis using the mtcars dataset to predict mpg based on all other variables. The steps include loading the data, estimating the model, selecting the best model using AIC and BIC, and visualizing the residuals.

Step 1: Load Data

Code
# Load necessary libraries
pacman::p_load(olsrr, ggplot2)

# Load the mtcars dataset
data(mtcars)

Step 2: Estimate the Full Model

Estimate a multiple linear regression model with mpg as the dependent variable and all other variables as independent variables.

Code
# Fit the full multiple linear regression model
full_model <- lm(mpg ~ ., data = mtcars)

# View the summary of the full model
summary(full_model)

Call:
lm(formula = mpg ~ ., data = mtcars)

Residuals:
    Min      1Q  Median      3Q     Max 
-3.4506 -1.6044 -0.1196  1.2193  4.6271 

Coefficients:
            Estimate Std. Error t value Pr(>|t|)  
(Intercept) 12.30337   18.71788   0.657   0.5181  
cyl         -0.11144    1.04502  -0.107   0.9161  
disp         0.01334    0.01786   0.747   0.4635  
hp          -0.02148    0.02177  -0.987   0.3350  
drat         0.78711    1.63537   0.481   0.6353  
wt          -3.71530    1.89441  -1.961   0.0633 .
qsec         0.82104    0.73084   1.123   0.2739  
vs           0.31776    2.10451   0.151   0.8814  
am           2.52023    2.05665   1.225   0.2340  
gear         0.65541    1.49326   0.439   0.6652  
carb        -0.19942    0.82875  -0.241   0.8122  
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

Residual standard error: 2.65 on 21 degrees of freedom
Multiple R-squared:  0.869, Adjusted R-squared:  0.8066 
F-statistic: 13.93 on 10 and 21 DF,  p-value: 3.793e-07

Step 3: Model Selection Using AIC and BIC

Note that in the full model no individual coefficient is significant at the 5% level even though the R-squared is high, a classic symptom of multicollinearity and a motivation for model selection. Use the olsrr package to perform model selection based on AIC and BIC criteria.

AIC Model

Code
# Stepwise selection based on AIC
model_aic <- ols_step_both_aic(full_model, details = FALSE)
model_aic

                             Stepwise Summary                              
-------------------------------------------------------------------------
Step    Variable        AIC        SBC       SBIC        R2       Adj. R2 
-------------------------------------------------------------------------
 0      Base Model    208.756    211.687    115.061    0.00000    0.00000 
 1      wt (+)        166.029    170.427     74.373    0.75283    0.74459 
 2      cyl (+)       156.010    161.873     66.190    0.83023    0.81852 
 3      hp (+)        155.477    162.805     66.696    0.84315    0.82634 
-------------------------------------------------------------------------

Final Model Output 
------------------

                         Model Summary                          
---------------------------------------------------------------
R                       0.918       RMSE                 2.349 
R-Squared               0.843       MSE                  5.519 
Adj. R-Squared          0.826       Coef. Var           12.501 
Pred R-Squared          0.796       AIC                155.477 
MAE                     1.845       SBC                162.805 
---------------------------------------------------------------
 RMSE: Root Mean Square Error 
 MSE: Mean Square Error 
 MAE: Mean Absolute Error 
 AIC: Akaike Information Criteria 
 SBC: Schwarz Bayesian Criteria 

                               ANOVA                                 
--------------------------------------------------------------------
                Sum of                                              
               Squares        DF    Mean Square      F         Sig. 
--------------------------------------------------------------------
Regression     949.427         3        316.476    50.171    0.0000 
Residual       176.621        28          6.308                     
Total         1126.047        31                                    
--------------------------------------------------------------------

                                  Parameter Estimates                                    
----------------------------------------------------------------------------------------
      model      Beta    Std. Error    Std. Beta      t        Sig      lower     upper 
----------------------------------------------------------------------------------------
(Intercept)    38.752         1.787                 21.687    0.000    35.092    42.412 
         wt    -3.167         0.741       -0.514    -4.276    0.000    -4.684    -1.650 
        cyl    -0.942         0.551       -0.279    -1.709    0.098    -2.070     0.187 
         hp    -0.018         0.012       -0.205    -1.519    0.140    -0.042     0.006 
----------------------------------------------------------------------------------------

BIC Model

Code
# Stepwise selection based on SBC (BIC)
model_bic <- ols_step_both_sbc(full_model, details = FALSE)
model_bic

                             Stepwise Summary                              
-------------------------------------------------------------------------
Step    Variable        AIC        SBC       SBIC        R2       Adj. R2 
-------------------------------------------------------------------------
 0      Base Model    208.756    211.687    115.061    0.00000    0.00000 
 1      wt (+)        166.029    170.427     74.373    0.75283    0.74459 
 2      cyl (+)       156.010    161.873     66.190    0.83023    0.81852 
-------------------------------------------------------------------------

Final Model Output 
------------------

                         Model Summary                          
---------------------------------------------------------------
R                       0.911       RMSE                 2.444 
R-Squared               0.830       MSE                  5.974 
Adj. R-Squared          0.819       Coef. Var           12.780 
Pred R-Squared          0.790       AIC                156.010 
MAE                     1.921       SBC                161.873 
---------------------------------------------------------------
 RMSE: Root Mean Square Error 
 MSE: Mean Square Error 
 MAE: Mean Absolute Error 
 AIC: Akaike Information Criteria 
 SBC: Schwarz Bayesian Criteria 

                               ANOVA                                 
--------------------------------------------------------------------
                Sum of                                              
               Squares        DF    Mean Square      F         Sig. 
--------------------------------------------------------------------
Regression     934.875         2        467.438    70.908    0.0000 
Residual       191.172        29          6.592                     
Total         1126.047        31                                    
--------------------------------------------------------------------

                                  Parameter Estimates                                    
----------------------------------------------------------------------------------------
      model      Beta    Std. Error    Std. Beta      t        Sig      lower     upper 
----------------------------------------------------------------------------------------
(Intercept)    39.686         1.715                 23.141    0.000    36.179    43.194 
         wt    -3.191         0.757       -0.518    -4.216    0.000    -4.739    -1.643 
        cyl    -1.508         0.415       -0.447    -3.636    0.001    -2.356    -0.660 
----------------------------------------------------------------------------------------

Explanation:

  • AIC (Akaike Information Criterion): Balances model fit with complexity. Lower AIC indicates a better model.
  • BIC (Bayesian Information Criterion): Similar to AIC but imposes a heavier penalty for additional parameters, favoring simpler models. Lower BIC indicates a better model.

By comparing AIC and BIC, we can select a model that adequately fits the data without unnecessary complexity.
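
Because olsrr stores the final fitted model in the $model element (also used in the plots below), the two selected models can be compared head to head with base R:

Code
# Direct comparison of the AIC- and BIC-selected models
AIC(model_aic$model, model_bic$model)
BIC(model_aic$model, model_bic$model)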

Step 4: Residuals vs. Fitted Plot

Plot the residuals versus fitted values to assess the assumptions of linearity and homoscedasticity.

Code
# Residuals vs fitted values for the AIC-selected model
aic_fit <- model_aic$model

ggplot(data.frame(fitted = fitted(aic_fit), resid = resid(aic_fit)),
       aes(x = fitted, y = resid)) +
  geom_point(color = "blue") +
  geom_hline(yintercept = 0, linetype = "dashed") +
  labs(title = "Residuals vs Fitted (AIC Selected Model)",
       x = "Fitted Values",
       y = "Residuals") +
  theme_minimal()

Step 5: Plot Histograms of Residuals

Visualize the distribution of residuals to assess normality.

Code
# Histogram of residuals for the AIC-selected model
ggplot(data.frame(resid = resid(model_aic$model)), aes(x = resid)) +
  geom_histogram(binwidth = 1, fill = "skyblue", color = "black") +
  labs(title = "Histogram of Residuals (AIC Selected Model)",
       x = "Residuals",
       y = "Frequency") +
  theme_minimal()
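
The histogram is informative but informal. As a complementary check, the Shapiro-Wilk test from base R can formally test the residuals for normality; a small p-value would suggest a departure from normality:

Code
# Formal normality test of the residuals from the AIC-selected model
shapiro.test(resid(model_aic$model))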

5 Conclusion

  • Takeaway 1: Multiple linear regression captures more complex relationships by including additional predictors, reducing omitted variable bias.
  • Takeaway 2: Coefficients in multiple regression represent partial effects, showing how the dependent variable changes with one predictor while controlling for others.
  • Takeaway 3: Model selection and fit criteria (like AIC, BIC, and stepwise methods) can guide which predictors to include, but practical judgment and theoretical considerations remain essential.